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Abstract 

We propose an original particle-based implementation of the Loopy Belief Prop¬ 
agation (LPB) algorithm for pairwise Markov Random Fields (MRF) on a con¬ 
tinuous state space. The algorithm constructs adaptively efficient proposal distri¬ 
butions approximating the local beliefs at each note of the MRF. This is achieved 
by considering proposal distributions in the exponential family whose parameters 
are updated iterately in an Expectation Propagation (EP) framework. The pro¬ 
posed particle scheme provides consistent estimation of the LBP marginals as the 
number of particles increases. We demonstrate that it provides more accurate re¬ 
sults than the Particle Belief Propagation (PBP) algorithm of [1] at a fraction of 
the computational cost and is additionally more robust empirically. The compu¬ 
tational complexity of our algorithm at each iteration is quadratic in the number 
of particles. We also propose an accelerated implementation with sub-quadratic 
computational complexity which still provides consistent estimates of the loopy 
BP marginal distributions and performs almost as well as the original procedure. 


1 Introduction 

Undirected Graphical Models (also known as Markov Random Fields) provide a flexible framework 
to represent networks of random variables and have been used in a large variety of applications in 
machine learning, statistics, signal processing and related fields [2]. For many applications such 
as tracking, sensor networks or imaging [1, 3], it can be beneficial to define MRF on continuous 
state-spaces. 

Given a pairwise MRF, we are here interested in computing the marginal distributions at the nodes 
of the graph. A popular approach to do this is to consider the Loopy Belief Propagation (LBP) algo¬ 
rithm [4, 5, 2], LBP relies on the transmission of messages between nodes. However when dealing 
with continuous random variables, computing these messages exactly is generally intractable. In 
practice, one must select a way to tractably represent these messages and a way to update these 
representations following the LBP algorithm. The Nonparametric Belief Propagation (NBP) algo¬ 
rithm [6] represents the messages with mixtures of Gaussians while the Particle Belief Propagation 
(PBP) algorithm [1] uses an importance sampling approach. NBP relies on restrictive integrability 
conditions and does not offer consistent estimators of the LBP messages. PBP offers a way to cir¬ 
cumvent these two issues but the implementation suggested proposes sampling from the estimated 
beliefs which need not be integrable. Moreover, even when they are integrable, sampling from 
the estimated beliefs is very expensive computationally. Practically the authors of [1] only sample 
approximately from them using short MCMC runs, leading to biased estimators. 

In our method, we consider a sequence of proposal distributions at each node from which one can 
sample particles at a given iteration of the LBP algorithm. The messages are then computed using 
importance sampling. The novelty of the approach is to propose a principled and automated way of 
designing a sequence of proposals in a tractable exponential family using the Expectation Propaga- 
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tion framework [7]. The resulting algorithm, which we call Expectation Particle Belief Propagation 
(EPBP), does not suffer from restrictive integrability conditions and sampling is done exactly which 
implies that we obtain consistent estimators of the LBP messages. The method is empirically shown 
to yield better approximations to the LBP beliefs than the implementation suggested in [1], at a 
much reduced computational cost, and than EP. 

2 Background 

2.1 Notations 

We consider a pairwise MRF, i.e. a distribution over a set of p random variables indexed by a set 
V = p}, which factorises according to an undirected graph G = (V, E) with 


p(xv) oc JJ ilt„(x u ) il> uv {x u ,x v ). 

uGV (u,i i)6E 
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The random variables are assumed to take values on a continuous, possibly unbounded, space X. 
The positive functions if> u : X >-> R + and ip uv : X x X ^ R + are respectively known as the node 
and edge potentials. The aim is to approximate the marginals p u {x u ) for all u G V. A popular 
approach is the LBP algorithm discussed earlier. This algorithm is a fixed point iteration scheme 
yielding approximations called the beliefs at each node [4, 2], When the underlying graph is a tree, 
the resulting beliefs can be shown to be proportional to the exact marginals. This is not the case in 
the presence of loops in the graph. However, even in these cases, LBP has been shown to provide 
good approximations in a wide range of situations [8, 5], The LBP fixed-point iteration can be 
written as follows at iteration t: 


m^ixy) = / il>uv(x u ,x v )i> u (x u ) m^^x^dx. 

J - .. y— T> \ 
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wGT u \v 
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where L„ denotes the neighborhood of u i.e., the set of nodes {zu | (w, u) G E}, m uv is known as 
the message from node u to node v and B u is the belief at node u. 

2.2 Related work 


The crux of any generic implementation of LBP for continuous state spaces is to select a way to rep¬ 
resent the messages and design an appropriate method to compute/approximate the message update. 

In Nonparametric BP (NBP) [6], the messages are represented by mixtures of Gaussians. In theory, 
computing the product of such messages can be done analytically but in practice this is impractical 
due to the exponential growth in the number of terms to consider when calculating the product of 
mixtures. To circumvent this issue, the authors suggest an importance sampling approach targeting 
the beliefs and fitting mixtures of Gaussians to the resulting weighted particles. The computation of 
the update (2) is then always done over a constant number of terms. 

A restriction of “vanilla” Nonparametric BP is that the messages must be finitely integrable for the 
message representation to make sense. This is the case if the following two conditions hold: 


sup 


J f’ uv (x u ,x v )dx u < oc, and J i/j u (x u ) da;* 


< oo. 


(3) 


These conditions do however not hold in a number of important cases as acknowledged in [3], For 
instance, the potential ip u (x u ) is usually proportional to a likelihood of the form p(y u \x u ) which 
needs not be integrable in x u . Similarly, in imaging applications for example, the edge potential 
can encode similarity between pixels which also need not verify the integrability condition as in [9], 
Further, NBP does not offer consistent estimators of the LBP messages. 

Particle BP (PBP) [1] offers a way to overcome the shortcomings of NBP: the authors also consider 
importance sampling to tackle the update of the messages but without fitting a mixture of Gaussians. 
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For a chosen proposal distribution q u on node u and a draw of N particles {xu^}fLi ~ q u {x u ), the 
messages are represented as mixtures: 

- n (4) 

/ wGT u \v 

This algorithm has the advantage that it does not require the conditions (3) to hold. The authors 
suggest two possible choices of sampling distributions: sampling from the local potential f> u , or 
sampling from the current belief estimate. The first case is only valid if ip u is integrable w.r.t. x u 
which, as we have mentioned earlier, might not be the case in general and the second case implies 
sampling from a distribution of the form 

Bl BP (x u ) oc ip u {x u ) m™(x u ) (5) 

w£T u 

which is a product of mixtures. As in NBP, naive sampling of the proposal has complexity 0{N' [Tu I) 
and is thus in general too expensive to consider. Alternatively, as the authors suggest, one can run 
a short MCMC simulation targeting it which reduces the complexity to order O (j F„ | N 2 ) since the 
cost of each iteration, which requires evaluating F> PBP point-wise, is of order C?(|T u |Ai), and we 
need 0(N) iterations of the MCMC simulation. The issue with this approach is that it is still com¬ 
putationally expensive, and it is unclear how many iterations are necessary to get N good samples. 

2.3 Our contribution 

In this paper, we consider the general context where the edge and node-potentials might be non- 
normalizable and non-Gaussian. Our proposed method is based on PBP, as PBP is theoretically 
better suited than NBP since, as discussed earlier, it does not require the conditions (3) to hold, and, 
provided that one samples from the proposals exactly, it yields consistent estimators of the LBP 
messages while NBP does not. Further, the development of our method also formally shows that 
considering proposals close to the beliefs, as suggested by [1], is a good idea. Our core observation 
is that since sampling from a proposal of the form (5) using MCMC simulation is very expensive, 
we should consider using a more tractable proposal distribution instead. However it is important that 
the proposal distribution is constructed adaptively, taking into account evidence collected through 
the message passing itself, and we propose to achieve this by using proposal distributions lying in a 
tractable exponential family, and adapted using the Expectation Propagation (EP) framework [7]. 

3 Expectation Particle Belief Propagation 

Our aim is to address the issue of selecting the proposals in the PBP algorithm. We suggest using 
exponential family distributions as the proposals on a node for computational efficiency reasons, 
with parameters chosen adaptively based on current estimates of beliefs and EP. Each step of our 
algorithm involves both a projection onto the exponential family as in EP, as well as a particle 
approximation of the LBP message, hence we will refer to our method as Expectation Particle 
Belief Propagation or EPBP for short. 

For each pair of adjacent nodes u and v, we will use m uv {x v ) to denote the exact (but unavailable) 
LBP message from u to v, fh uv (x v ) to denote the particle approximation of m uv , and rj uv an expo¬ 
nential family projection of m uv . In addition, let r] ou denote an exponential family projection of the 
node potential ip u . We will consider approximations consisting of N particles. In the following, we 
will derive the form of our particle approximated message rh uv (x v ), along with the choice of the 
proposal distribution q u {x u ) used to construct m uv . Our starting point is the edge-wise belief over 
x u and x v , given the incoming particle approximated messages, 

B uv (x u ,x v ) <xip uv (x u ,x v )ip u (x u )ip v (x v ) fh wu (x u ) m vv (x v ). (6) 

w(zT u \v v(zF v \u 

The exact LBP message rn uv (x v ) can be derived by computing the marginal distribution B uv (x v ), 
and constructing m uv (x v ) such that 

Buv(xy') oc tn uv {^xf)^I vu {^x v f (7) 
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where M vu (x v ) = ip v {x v ) Ylver \u ™'w(x v ) is the (particle approximated) pre-message from v to 
u. It is easy to see that the resulting message is as expected, 

m uv (x v ) oc ipuv(xu,x v )ip u {x u ) fh wu {x u )dx u . (8) 

w£T u \v 

Since the above exact LBP belief and message are intractable in our scenario of interest, the idea 
is to use an importance sampler targeting B uv (x u ,x v ) instead. Consider a proposal distribution 
of the form q u (x u )q v (x v ). Since x u and x v are independent under the proposal, we can draw 
N independent samples, say {xu'jfl;! and from q u and q v respectively. We can then 

approximate the belief using a N x TV cross product of the particles, 


N 


B uv (x u , x v ) ~ 2 'y ' 


B U V 


\x{^) 


N2 6^1 qn{x { u)q v {Xv ) ) 
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Marginalizing onto x v , we have the following particle approximation to B uv (x v ), 


b uv (x v ) « 'y) 


N 


i=l 


Tfl , uv( < 3'V ) ) 
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q v {xv) 


Sm (x v ) 


( 10 ) 


where the particle approximated message fh uv {x v ) from u to v has the form of the message repre¬ 
sentation in the PBP algorithm (4). 


To determine sensible proposal distributions, we can find q u and q v that are close to the target B uv . 
Using the KL divergence KL(B uv \\q u q v ) as the measure of closeness, the optimal q u required for the 
u to v message is the node belief, 

B uv {x u ) oc ip u (x u ) fh wu {x u ) (11) 

w(zT u 

thus supporting the claim in [1] that a good proposal to use is the current estimate of the node belief. 
As pointed out in Section 2, it is computationally inefficient to use the particle approximated node 
belief as the proposal distribution. An idea is to use a tractable exponential family distribution for 
q u instead, say 

q u {x u ) oc 

Vou n '11wu{Xvl) (12) 

w£T u 

where q au and rj wu are exponential family approximations of ip u and fh wu respectively. In Section 
4 we use a Gaussian family, but we are not limited to this. Using the framework of expectation 
propogation (EP) [7], we can iteratively find good exponential family approximations as follows. 
For each w e F^, to update the q wu , we form the cavity distribution qu W oc q u /t] wu and the corre¬ 
sponding tilted distribution rh wu (p^’. The updated is the exponential family factor minimising 
the KL divergence, 

Vwu = ar S min KL \ m wu (x u )q\ w (x u )\\v(x u )ql w (x u ) . (13) 

77£exp.fam. L II J 

Geometrically, the update projects the tilted distribution onto the exponential family manifold. 
The optimal solution requires computing the moments of the tilted distribution through numeri¬ 
cal quadrature, and selecting q wu so that q wu qu W matches the moments of the tilted distribution. In 
our scenario the moment computation can be performed cmdely on a small number of evaluation 
points since it only concerns the updating of the IS proposal. If an optimal q in the exponential 
family does not exist, e.g. in the Gaussian case that the optimal q has a negative variance, we simply 
revert q wu to its previous value [7]. An analogous update is used for q ou . 


In the above derivation, the expectation propagation steps for each incoming message into u and for 
the node potential are performed first, to fit the proposal to the current estimated belief at u, before 
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it is used to draw N particles, which can then be used to form the particle approximated messages 
from u to each of its neighbours. Alternatively, once each particle approximated message rh uv (x v ) 
is formed, we can update its exponential family projection q uv {x v ) immediately. This alternative 
scheme is described in Algorithm 1. 

Algorithm 1 Node update 

1: sample {xu } } ~ q u { ■) 

2: compute B u (x^) = ip u (x n„, 6 r u fhwu{xu ] ) 

3: for v G T,, do 

4: compute M uv (x$) :&■ B u (x$)/m vu (x$) 

5: compute the normalized weights Wul oc M uv (xu^) / q u (xu' > ) 

6: update the estimator of the outgoing message m uv (x v ) = w uvil’uv(xu\ x v ) 

7: compute the cavity distribution q}° oc q v /r) ov , get r/+, in the exponential family such that 

rjt v q)° approximates -ip v qv °, update q v oc ry+ and let rj ov 3- r]+ v 
8: compute the cavity distribution q), u oc q v /r) uv , get rj^ v in the exponential family such that 

r lu V qv" approximates fh uv q^ u 
9: end for 


3.1 Computational complexity and sub-quadratic implementation 

Each EP projection step costs O(N) computations since the message fh wu is a mixture of N compo¬ 
nents (see (4)). Drawing N particles from the exponential family proposal q u costs 0{N). The step 
with highest computational complexity is in evaluating the particle weights in (4). Indeed, evaluating 
the mixture representation of a message on a single point is O(N), and we need to compute this for 
each of N particles. Similarly, evaluating the estimator of the belief on N sampling points at node 
u requires 0(|r u |7V 2 ). This can be reduced since the algorithm still provides consistent estimators 
if we consider the evaluation of unbiased estimators of the messages instead. Since the messages 
have the form m uv (x v ) = J2iLi w uv' t l J uv( x v )> we can follow a method presented in [10] where 
one draws M indices from a multinomial with weights and evaluates the corre¬ 

sponding M components ipuv- This reduces the cost of the evaluation of the beliefs to 0(\T u \MN) 
which leads to an overall sub-quadratic complexity if M is o(N). We show in the next section how 
it compares to the quadratic implementation when M = O (log TV). 

4 Experiments 

We investigate the performance of our method on MRFs for two simple graphs. This allows us 
to compare the performance of EPBP to the performance of PBP in depth. We also illustrate the 
behavior of the sub-quadratic version of EPBP. Finally we show that EPBP provides good results in 
a simple denoising application. 

4.1 Comparison with PBP 

We start by comparing EPBP to PBP as implemented by Ihler et al. on a 3 x 3 grid (figure 1) 
with random variables taking values on R. The node and edge potentials are selected such that the 
marginals are multimodal, non-Gaussian and skewed with 

f i>u{xu) = o‘iJ^(x u -y u ;-2,l) + a 2 g(x u ^y u ;2,1.3) 14 > 

‘ipuv(x u , x v ) — £(x u x v , 0,2) 

where y u denotes the observation at node u, M(x; /z, cr) oc exp(— x 2 /2a 2 ) (density of a Normal 
distribution), Q(x;n : /3) oc exp(— (x — ^i)//3 + exp (—(x — /i)//3)) (density of a Gumbel distribution) 
and C(x\ n, f3) oc exp (—\x — qi\/$) (density of a Laplace distribution). The parameters ai and «2 
are respectively set to 0.6 and 0.4. We compare the two methods after 20 LBP iterations. 1 

'The scheduling used alternates between the classical orderings: top-down-left-right, left-right-top-down, 
down-up-right-left and right-left-down-up. One “LBP iteration’’ implies that all nodes have been updated once. 
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Figure 1: Illustration of the grid (left) and tree (right) graphs used in the experiments. 


PBP as presented in [1] is implemented using the same parameters than those in an implementation 
code provided by the authors: the proposal on each node is the last estimated belief and sampled with 
a 20-step MCMC chain, the MH proposal is a normal distribution. For EPBP, the approximation of 
the messages are Gaussians. The ground tmth is approximated by mnning LBP on a deterministic 
equally spaced mesh with 200 points. All simulations were run with Julia on a Mac with 2.5 GHz 
Intel Core i5 processor, the code is available online. 2 

Figure 2 compares the performances of both methods. The error is computed as the mean L 1 error 
over all nodes between the estimated beliefs and the ground truth evaluated over the same deter¬ 
ministic mesh. One can observe that not only does PBP perform worse than EPBP but also that 
the error plateaus with increasing number of samples. This is because the sampling within PBP is 
done approximately and hence the consistency of the estimators is lost. The speed-up offered by 
EPBP is very substantial (figure 4 left). Hence, although it would be possible to use more MCMC 
((Metropolis-Hastings) iterations within PBP to improve its performance, it would make the method 
prohibitively expensive to use since every MCMC step has quadratic cost. Note that for EPBP, one 
observes the usual 1 /yfN convergence of particle methods. 

Figure 3 compares the estimator of the beliefs obtained by the two methods for three arbitrarily 
picked nodes (node 1, 5 and 9 as illustrated on figure 1). The figure also illustrates the last proposals 
constructed with our approach and one notices that their supports match closely the support of the 
true beliefs. Figure 4 left illustrates how the estimated beliefs converge as compared to the true 
beliefs with increasing number of iterations. One can observe that PBP converges more slowly and 
that the results display more variability which might be due to the MCMC runs being too short. 

We repeated the experiments on a tree with 8 nodes (figure 1 right) where we know that, at con¬ 
vergence, the beliefs computed using BP are proportional to the tme marginals. The node and edge 
potentials are again picked such that the marginals are multimodal with 



a\N(x u - y u - -2,1) + a 2 Af{x u - y u ; 1,0.5) 
^{x u x Vl 0,1) 


(15) 


with ai = 0.3 and a 2 = 0.7. On this example, we also show how “pure EP” with normal distribu¬ 
tions performs. We also try using the distributions obtained with EP as proposals for PBP (referred 
to as “PBP after EP” in figures). Both methods underperform compared to EPBP as illustrated vi¬ 
sually in Figure 5. In particular one can observe in Figure 3 that “PBP after EP” converges slower 
than EPBP with increasing number of samples. 

4.2 Sub-quadratic implementation and denoising application 

As outlined in Section 3.1, in the implementation of EPBP one can use an unbiased estimator of 
the edge weights based on a draw of M components from a multinomial. The complexity of the 
resulting algorithm is O(MN). We apply this method to the 3 x 3 grid example in the case where 
M is picked to be roughly of order log(iV): i.e., for N = {10,20,50,100,200,500}, we pick 
M = {5, 6,8,10,11,13}. The results are illustrated in Figure 6 where one can see that the N log N 
implementation compares very well to the original quadratic implementation at a much reduced 
cost. We apply this sub-quadratic method on a simple probabilistic model for an image denoising 
problem. The aim of this example is to show that the method can be applied to larger graphs and still 
provide good results. The model underlined is chosen to showcase the flexibility and applicability of 
our method in particular when the edge-potential is non-integrable and is not claimed to be optimal. 
The node and edge potentials are defined as follows: 


( i’ u {x u ) 

\ ^Puv {,Xu 7 X v ) 


A f(x u - y u \ 0,0.1) 
C x {x u — x v \ 0, 0.03) ’ 


( 16 ) 


2 https://github.com/tlienart/EPBP 
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where £ A (x;/z,/3) = £(x;/x,/?) if |x| < A and £(A;/z,/3) otherwise. In this example we set 
A = 0.2. The value assigned to each pixel of the reconstruction is the estimated mean obtained over 
the corresponding node (figure 7). The image has size 50 x 50 and the simulation was run with 
N = 30 particles per nodes, M = 5 and 10 BP iterations taking under 2 minutes to complete. We 
compare it with the result obtained with EP on the same model. 




Figure 2: (left) Comparison of the mean L 1 error for PBP and EPBP for the 3 x 3 grid example, 
(right) Comparison of the mean L 1 error for “PBP after EP” and EPBP for the tree example. In both 
cases, EPBP is more accurate for the same number of samples. 





Figure 3: Comparison of the beliefs on node 1, 5 and 9 as obtained by evaluating LBP on a deter¬ 
ministic mesh (true belief), with PBP and with EPBP for the 3 x 3 grid example. The proposal used 
by EPBP at the last step is also illustrated. The results are obtained with N = 100 samples on each 
node and 20 BP iterations. One can observe visually that EPBP outperforms PBP. 




Figure 4: (left) Comparison of the convergence in L 1 error with increasing number of BP iterations 
for the 3 x 3 grid example when using IV = 30 particles, (right) Comparison of the wall-clock time 
needed to perform PBP and EPBP on the 3 x 3 grid example. 

5 Discussion 

We have presented an original way to design adaptively efficient and easy-to-sample-from proposals 
for a particle implement of Loopy Belief Propagation. Our proposal is inspired by the Expectation 
Propagation framework. 

We have demonstrated empirically that the resulting algorithm is significantly faster and more accu¬ 
rate than an implementation of PBP using the estimated beliefs as proposals and sampling from them 
using MCMC as proposed in [1]. It is also more accurate than EP due to the nonparametric nature 
of the messages and offers consistent estimators of the LBP messages. A sub-quadratic version of 
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the method was also outlined and shown to perform almost as well as the original method, it was 
also applied successfully in an image denoising example. 

We believe that our method could be applied successfully to a wide range of applications such as 
smoothing for Hidden Markov Models [11], tracking or computer vision [12, 13]. In future work, 
we will look at considering other divergences than the KL and the “Power EP” framework [14], we 
will also look at encapsulating the present algorithm within a sequential Monte Carlo framework 
and the recent work of Naesseth et al. [15]. 





Figure 5: Comparison of the beliefs on node 1, 3 and 8 as obtained by evaluating LBP on a deter¬ 
ministic mesh, using EPBP, PBP, EP and PBP using the results of EP as proposals. This is for the 
tree example with N = 100 samples on each node and 20 LBP iterations. Again, one can observe 
visually that EPBP outperforms the other methods. 




Figure 6: Comparison of the mean L 1 error for PBP and EPBP on a 3 x 3 grid (left). For the 
same number of samples, EPBP is more accurate. It is also faster by about two orders of magnitude 
(right). The simulations were run several times for the same observations to illustrate the variability 
of the results. 



Figure 7: From left to right: comparison of the original (first), noisy (second) and recovered image 
using the sub-quadratic implementation of EPBP (third) and with EP (fourth). 
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